Abstract. We investigate the statistical properties of customer votes for spots in France collected by the startup company NOMAO. The frequencies of votes per spot and per customer are characterized by power law distributions which remain stable on a time scale of a decade while the number of votes varies by almost two orders of magnitude. Using computer science methods we explore the spectrum and the eigenvectors of a matrix containing user ratings of geolocalized items. The eigenvectors map nicely to large towns and regions but show a certain level of instability as we modify the interpretation of the underlying matrix. We evaluate imputation strategies that provide improved prediction performance by reaching geographically smooth eigenvectors. We point out possible links between the distribution of votes and the phenomenon of self-organized criticality.

PACS. 89.75.Fb Structures and organization in complex systems - 89.75.He Networks and genealogical 
trees - 89.20.Hh World Wide Web, Internet 


1 Introduction 

The young startup company NOMAO [20] collected a large database of customer (or user) votes for spots (or Points of Interest, POIs, or items) in France. The spots are mainly restaurants and hotels with known geolocation coordinates. In this paper we investigate the statistical properties of these NOMAO votes and ratings of geolocalized items in a mix of geographic information and recommender systems. The geographical distributions of votes are shown in Fig. 1 for the whole of France and more specifically for Paris. The frequency distributions of votes per spot and votes per user are shown in Fig. 2 for France at different time intervals. They show that these frequency distributions are stable in time, and thus we are dealing with an unusual statistical system that is in a certain steady state. We note that at present a variety of real systems and networks are found to possess power law distributions (see e.g. [10]), and thus here we investigate a new case of this type with algebraic statistical properties.

To analyze the statistical properties of this real system we use the methods of recommender systems [29], which gained broad recognition in computer science after the Netflix Prize competition [28]. In our research, distance, region and location become side information for a multi-objective classification or regression problem. We concentrate on predicting user preferences by collaborative filtering based on spectral analysis that uses geolocation in addition to the ratings matrix.

We investigate how user taste, as described by latent factors, is reflected in the geographic information system. We compare the latent factors obtained by a full spectral analysis and by the stochastic gradient method, the standard recommendation technique [29] applicable to matrices with a very large fraction of missing values.

The key difficulty in the spectral analysis lies in the abundance of missing values in the rating matrix: our matrix consists of 99.5% missing values while the Netflix matrix, for example, is 99% unknown. Several early results describe expectation maximization based singular value decomposition (SVD) algorithms, dating back to the seventies, and the method has also been described for a recommender application.

A successful implementation of spectral analysis in recommender matrices with only a few known elements is described by Simon Funk. His method is a variant of Stochastic Gradient Descent (SGD) reminiscent of gradient boosting. SGD computes no eigenvalues and does not guarantee the orthogonality of the matrix factors. On the other hand, regularization is easily incorporated in SGD, which enables better handling of the very large number of missing values in the matrix and, in particular, prevents overfitting to the training elements and provides better quality predictions of the unknown ratings.

Fig. 1: Geographical distribution of votes for spots (POIs) in the original datasets. Top panel: case of France; bottom panel: case of Paris (each square pixel represents 1370 m^2); color bars give the number of votes per pixel (cell); the number of votes is capped for a better color representation.

In this paper, after describing methods and related results (Section 2) and the NOMAO datasets (Section 3), we compare and visualize the geo-localization of the matrix factors defined by SVD and SGD under various parameter settings in Section 4. In Section 5 we show that by imputing ratings to nearby locations we may form factors that yield a better description of the ratings matrix.

Fig. 2: Differential frequency distributions of votes for the case of France for different time intervals until year 2004, 2006, 2008, 2010, 2012. Top panel: differential probability $P(\nu)$ to have $\nu$ votes for spots (POIs); bottom panel: differential probability $P(\mu)$ to have $\mu$ votes per user (customer). Here the dashed lines show an average algebraic decay with exponents $a = 1.5$ (top), $b = 2.75$ (bottom).


2 Methods and related results 

Recommenders based on the rank k approximation of the 
rating matrix with the first k singular vectors are probably 
first described in [5112111151122] and many others near year 
2000. 

The Singular Value Decomposition (SVD) of a rank $\rho$ matrix $R$ is given by $R = U \Sigma V^T$ with $U$ an $m \times \rho$, $\Sigma$ a $\rho \times \rho$ and $V$ an $n \times \rho$ matrix such that $U$ and $V$ are orthogonal. By the Eckart-Young theorem the best rank-$k$ approximation of $R$ with respect to the Frobenius norm is $U_k \Sigma_k V_k^T$, with error

$$\|R - U_k \Sigma_k V_k^T\|_F^2 = \sum_{ij} \Big( r_{ij} - \sum_{k} \sigma_k u_{ki} v_{kj} \Big)^2, \qquad (1)$$

where $U_k$ is an $m \times k$ and $V_k$ is an $n \times k$ matrix containing the first $k$ columns of $U$ and $V$, and $\Sigma_k$ is the diagonal matrix containing the first $k$ entries of $\Sigma$. The inner sum runs over the first $k$ singular values $\sigma_k$.

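The RMSE differs from the above equation only in that the summation is over the known ratings:

$$\mathrm{RMSE}^2 = \sum_{ij \in \mathrm{known}} \mathrm{err}_{ij}^2, \quad \text{where} \quad \mathrm{err}_{ij} = r_{ij} - \sum_{k} \sigma_k u_{ki} v_{kj}. \qquad (2)$$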
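The two error measures above can be illustrated numerically. The following is a minimal sketch, assuming a small dense toy matrix and NumPy; the helper names `truncated_svd` and `squared_error` and the random data are hypothetical and are not taken from the paper. It checks the Eckart-Young property that the squared Frobenius error of the rank-$k$ truncation equals the sum of the discarded squared singular values, and evaluates the same error restricted to a mask of "known" entries as in equation (2).

```python
import numpy as np

def truncated_svd(R, k):
    """Return U_k, the k largest singular values, and V_k of a dense matrix R."""
    U, s, Vt = np.linalg.svd(R, full_matrices=False)
    return U[:, :k], s[:k], Vt[:k, :].T

def squared_error(R, U_k, s_k, V_k, known=None):
    """Sum of squared errors of the rank-k reconstruction.

    With known=None the sum runs over all entries (Eq. 1);
    with a boolean mask it runs over the known entries only (Eq. 2).
    """
    err = R - (U_k * s_k) @ V_k.T
    if known is not None:
        err = err[known]
    return float(np.sum(err ** 2))

rng = np.random.default_rng(0)
R = rng.integers(1, 6, size=(8, 6)).astype(float)  # toy ratings, hypothetical data
k = 2
U_k, s_k, V_k = truncated_svd(R, k)

full_error = squared_error(R, U_k, s_k, V_k)
s_all = np.linalg.svd(R, compute_uv=False)
tail_energy = float(np.sum(s_all[k:] ** 2))  # discarded squared singular values

known = rng.random(R.shape) < 0.5  # hypothetical mask of known ratings
masked_error = squared_error(R, U_k, s_k, V_k, known)
```

Here `full_error` and `tail_energy` agree up to rounding, while `masked_error` only counts the entries flagged as known and is therefore never larger than `full_error`.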

Early works [22] used SVD for recommenders by defining various strategies for handling the missing values in the rating matrix $R$ [18]. The most natural idea is to impute the missing elements by zeroes, averages, or even to repeatedly re-fill them by predictions. It has turned out that all of the above missing value imputation methods overfit to the imputed values. More recent results emphasize the importance of regularization to avoid overfitting. For this reason, the recommender systems community turned away from SVD and uses other optimization methods for rating matrices with missing values, most notably stochastic gradient descent [26] and alternating least squares [16].

In our problem, locality is additional information that can be exploited for analyzing the recommender matrix. Surveys on recommendations in location-based social networks combine spatial ratings for non-spatial items, non-spatial ratings for spatial items, and spatial ratings for spatial items. Flickr geotags have been used for travel route recommendation, concentrating on routes and not on individual places. User similarity based methods may combine friendship information with the distance between the users' home locations [31, 32].

Most similar to our method is the Probabilistic Matrix Factorization approach that fuses geographic information [7] and observes that “users tend to check in around several centers, where the check-in locations follow a Gaussian distribution at each center [... and] the probability of visiting a place is inversely proportional to the distance from its nearest center; if a place is too far away from the location a user lives, although he/she may like that place, he/she would probably not go there.”

3 The Nomao Datasets 

Nomao is a startup company located in France [20]. It analyzes point of interest (POI) rating and reservation services and collects POI information, including user ratings, from France, with a special emphasis on the Paris and Toulouse regions where the company headquarters are located. The Nomao dataset used in our experiments contains user-POI ratings and GPS information of the rated POIs. We investigate two separate datasets. The first one contains information on POIs in France, while the second has ratings only on POIs located in Paris. We analyze the datasets collected during the time period up to year 2012.

Table 1 (top) shows the basic attributes of the original datasets. The average number of ratings per item is relatively large, while the average number of ratings per user is very low. Moreover, only a tiny fraction of all user-item scores is known (about 0.0075% for Paris and 0.0013% for France).


Table 1: Attributes of the original (top) and cleaned (bottom) datasets.

original                      Paris        France
Number of ratings             1,539,964    1,432,601
Number of users               998,127      1,077,568
Number of items               20,576       99,976
Average ratings per user      1.543        1.329
Average ratings per item      74.84        14.32
Ratio of known ratings        0.0075%      0.0013%

cleaned                       Paris        France
Number of ratings             114,352      97,452
Number of users               5,756        9,471
Number of items               2,952        7,605
Average ratings per user      19.87        10.29
Average ratings per item      38.74        12.81
Ratio of known ratings        0.672%       0.135%
Average rating                3.714        3.747
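The derived rows of Table 1 follow from the three raw counts by simple arithmetic. As a minimal sketch, assuming only the counts printed in the table (the function name `dataset_attributes` is hypothetical), the averages and the ratio of known ratings can be recomputed as follows.

```python
def dataset_attributes(n_ratings, n_users, n_items):
    """Derived attributes of a user-item rating dataset from its three raw counts."""
    return {
        "ratings_per_user": n_ratings / n_users,
        "ratings_per_item": n_ratings / n_items,
        # share of filled cells in the n_users x n_items matrix, in percent
        "known_ratio_percent": 100.0 * n_ratings / (n_users * n_items),
    }

# Counts taken from Table 1 (Paris columns).
paris_original = dataset_attributes(1_539_964, 998_127, 20_576)
paris_cleaned = dataset_attributes(114_352, 5_756, 2_952)
```

The same call with the France counts gives the France columns. The comparison shows how strongly the cleaning step raises the density of the matrix, by roughly two orders of magnitude for Paris.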


The distribution $P(\nu)$ of the frequency of votes per spot $\nu$ (or item $i$) for the France dataset is shown in the top panel of Fig. 2. This distribution is stable in time and is well described by the power law $P(\nu) \propto 1/\nu^a$ with $a \approx 1.5$. Also, the distribution $P(\mu)$ of the frequency of votes per customer $\mu$ (or user $u$) remains stable in time with the power law dependence $P(\mu) \propto 1/\mu^b$ with $b \approx 2.75$. It is important to note that these distributions remain stable from year 2004 up to year 2012 even though the number of votes increases by almost two orders of magnitude during this period. At the moment we cannot provide theoretical reasons for the values of these exponents.
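For readers who want to reproduce such an exponent from raw vote counts, a minimal sketch is given below. It is an assumption that a least-squares fit on log-log axes is adequate here; the fitting procedure actually used for the dashed lines of Fig. 2 is not stated in the text, and maximum-likelihood estimators are often preferred for noisy tails. The function names and the synthetic input are hypothetical.

```python
import numpy as np

def vote_frequency_distribution(votes_per_entity):
    """Differential distribution: fraction of entities (spots or users)
    that received exactly nu votes, for every nu >= 1 that occurs."""
    counts = np.bincount(np.asarray(votes_per_entity))
    nu = np.nonzero(counts)[0]
    nu = nu[nu > 0]
    return nu, counts[nu] / counts[nu].sum()

def fit_exponent(nu, p):
    """Slope of log P versus log nu; returns a in P(nu) ~ 1 / nu**a."""
    slope, _intercept = np.polyfit(np.log(nu), np.log(p), 1)
    return -slope

# Tiny example of building the distribution from raw counts.
nu_small, p_small = vote_frequency_distribution([1, 1, 2, 3, 1])

# Synthetic check: an exact power law with exponent 1.5 is recovered.
nu_grid = np.arange(1, 1001).astype(float)
p_grid = nu_grid ** -1.5
p_grid /= p_grid.sum()
a_hat = fit_exponent(nu_grid, p_grid)
```

On real data the tail of the histogram is sparse, so logarithmic binning or a restricted fitting range would usually be needed before the slope becomes meaningful.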

We define user activity as the number of times a user scored different items. We define item activity similarly. Fig. 3 shows the probability density function (PDF) and the cumulative distribution function (CDF) of user activities. Fig. 4 shows the same distributions for items. Both user and item activities follow power-law distributions, with the exponent values being very similar for the France and Paris datasets. As in Fig. 2, we find that the exponent for the probability of votes for POIs is $a \approx 1.5$ while the exponent for the probability of votes of users is $b \approx 2.75$.

To handle the extreme sparsity of the user-item matrices, we selected a smaller subset of the user-item rating datasets by the following selection criteria:

— We only used ratings between 0-5. Part of the ratings, 
probably originating from a different system, were out 
of this range. 

— We filtered out users and items that have less than A 
ratings. In other words, we selected the subgraph of the 
user-item rating bipartite graph with users and items 
that have degree at least A. For Paris we set A = 10, 
for France we set A = 5. 
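The two selection criteria can be sketched as a small filtering routine. This is an illustration under stated assumptions, not the authors' cleaning script: the function name `filter_min_degree` is hypothetical, ratings are assumed to be `(user, item, score)` triples, and the pruning is repeated until no user or item falls below the threshold, which is one possible reading of "the subgraph with degree at least A".

```python
from collections import Counter

def filter_min_degree(ratings, min_degree, lo=0.0, hi=5.0):
    """Keep scores in [lo, hi], then repeatedly drop users and items
    with fewer than min_degree ratings until the set is stable."""
    kept = [(u, i, r) for (u, i, r) in ratings if lo <= r <= hi]
    while True:
        user_deg = Counter(u for u, _, _ in kept)
        item_deg = Counter(i for _, i, _ in kept)
        pruned = [(u, i, r) for (u, i, r) in kept
                  if user_deg[u] >= min_degree and item_deg[i] >= min_degree]
        if len(pruned) == len(kept):
            return pruned
        kept = pruned

# Toy example: user "c" has a single rating and one score is out of range.
toy = [("a", "x", 4.0), ("a", "y", 5.0), ("b", "x", 3.0),
       ("b", "y", 2.0), ("c", "x", 1.0), ("a", "z", 9.0)]
kept = filter_min_degree(toy, min_degree=2)
```

A single pass can leave users or items whose degree dropped below the threshold because their neighbors were removed; iterating avoids that, at the cost of removing somewhat more data.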

Table 1 (bottom) shows the attributes of the selected subsets. In what follows we use these datasets in our experiments.

In Fig. 5 we show the score distributions: the top (bottom) panel shows the distributions for the original (cleaned) datasets. We see that the original and cleaned datasets have similar distributions of scores.

Fig. 6 shows the geographical density of POIs in France (top) and Paris (bottom) for the original datasets. The geolocation data of POIs are used in the following sections for spectral analysis.

In the following we perform all computations with the cleaned datasets, since the analysis of multiple votes of the same user provides more reliable statistical data.

Fig. 3: Probability density function (top) and cumulative distribution function (bottom) of user activities in the original datasets.

Fig. 4: Probability density function (top) and cumulative distribution function (bottom) of item activities in the original datasets.

Fig. 5: Score distributions with integer binning. Top: original dataset; bottom: cleaned dataset.


4 Spectra of recommender matrices


4.1 Singular value decomposition



The recommender matrix $R$ consists of the preference values $r(u, i)$ of users $u$ for items $i$. The values may denote explicit rating values, e.g. 1-5 stars for Netflix movies [3]. We may also consider the so-called implicit ratings problem, where the value is 1 if user $u$ visited POI $i$ and 0 otherwise. The value of the explicit matrix is missing whenever the user has given no rating yet. In most cases this matrix is very sparse, with only 1% or less known values. The implicit matrix is always a full 0-1 matrix; however, the 0 values are uncertain: the user may not know about the item or may have had no time yet to visit it.

Fig. 6: Geographical distribution of POIs in the original datasets. Top panel: case of France; bottom panel: case of Paris (each square pixel represents 1370 m^2); color bars give the number of POIs per pixel (cell); the number of POIs is capped for a better color representation.

The so-called Latent Factor Model is an approximation $\hat{R}$ of the original rating matrix $R$,

$$\hat{r}(u, i) = \sum_{f=1}^{k} p_{uf} q_{if}, \qquad (3)$$

where $P = [p_{uf}]$ and $Q = [q_{if}]$ are the user and item factor models, respectively.

For a fixed number of factors $k$, $\hat{r}$ approximates $r$ with the smallest root mean squared error if it is defined by the singular vectors corresponding to the $k$ largest singular values,

$$\hat{r}(u, i) = \sum_{f=1}^{k} \sigma_f U_{uf} V_{if}, \qquad (4)$$

where the singular value decomposition (SVD) of $R$ is $U \Sigma V^T$.

Since

$$R R^T = U \Sigma^2 U^T \quad \text{and} \quad R^T R = V \Sigma^2 V^T, \qquad (5)$$

the spectrum of the recommender matrix $R$ is defined identically by the square roots of the eigenvalues of $R R^T$ or $R^T R$. These latter matrices are symmetric positive semidefinite, so their spectrum is real and non-negative.

If $R$ contains missing values, as in the case of an explicit rating matrix, the SVD is undefined. We may still define the best root mean square approximation by summing the error for the known ratings only, as in equation (2).
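The relation between the singular values of $R$ and the eigenvalues of $R R^T$ and $R^T R$ is easy to check numerically on a fully known matrix. The sketch below is only an illustration on random toy data with NumPy; it assumes a dense matrix without missing entries (for instance after zero imputation), and the variable names are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(1)
R = rng.random((5, 4))  # toy fully known matrix, hypothetical data

# Singular values of R, sorted in decreasing order by NumPy.
sing = np.linalg.svd(R, compute_uv=False)

# Eigenvalues of the two symmetric products, reordered to be decreasing.
eig_items = np.linalg.eigvalsh(R.T @ R)[::-1]  # 4 eigenvalues
eig_users = np.linalg.eigvalsh(R @ R.T)[::-1]  # 5 eigenvalues, the last one ~ 0

# Clip tiny negative rounding errors before taking square roots.
spec_items = np.sqrt(np.clip(eig_items, 0.0, None))
spec_users = np.sqrt(np.clip(eig_users, 0.0, None))
```

Both `spec_items` and the leading four entries of `spec_users` coincide with `sing`; the larger product only adds eigenvalues that are zero up to rounding.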


4.2 Stochastic gradient descent: Latent factor modeling with missing values

We use a regularized matrix factorization method and minimize the squared error of the $k$-factor model

$$\hat{r}(u, i) = \sum_{l=1}^{k} p_{ul} q_{il}, \qquad (6)$$

where $p$ and $q$ contain the user and item models, respectively. By adding regularization with weight $\lambda$, we optimize the regularized squared error over the known ratings.

For a single event $(u, i)$ we optimize the coefficients $p_{ul}$ and $q_{il}$ for $l = 1, \ldots, k$ by gradient descent with learning rate $\mathrm{lrate}$ as

$$p_{ul} \leftarrow p_{ul} + \mathrm{lrate} \cdot \Big( r(u, i) - \sum_{l'=1}^{k} p_{ul'} q_{il'} \Big) q_{il} - \mathrm{lrate} \cdot \lambda p_{ul}, \qquad (8)$$

$$q_{il} \leftarrow q_{il} + \mathrm{lrate} \cdot \Big( r(u, i) - \sum_{l'=1}^{k} p_{ul'} q_{il'} \Big) p_{ul} - \mathrm{lrate} \cdot \lambda q_{il}. \qquad (9)$$

Unlike SVD, where the singular values are sorted, the SGD factors are not ordered by the above equations. In order to produce the eigenvector maps, we built ranked factors by an iterative SGD that optimizes only a single factor at a time [12].


4.3 Mapping SVD and SGD latent factors

First we set each unknown value of $R$ to zero and computed the SVD. The first, second, and fourth singular vectors are plotted over the map of France (Fig. 7, left) by assigning the value in the vector to the location of the POI. More specifically, we averaged these values on a grid to create the final heat maps. The smoothing algorithm weighted the value of each POI to the closest grid point inversely proportional to their Euclidean distance.

The heat maps in Fig. 7, left, indicate that the singular vectors are strongly geolocation related. The first few dimensions correspond to the largest cities in France.

Similarly, we investigated the latent vectors of $R$ computed with the SGD algorithm. The first, second and fourth latent vectors are plotted over the map of France in Fig. 7, right, similar to the SVD singular vectors. While the SVD singular vectors were each centered on a large city, the SGD latent factors are linear combinations of them. The latent factors are also geolocation related, but they are not separated among the main cities like the SVD singular vectors.

In Fig. 8 we mapped the first three singular vectors of the Paris dataset. The different vectors may focus on different districts. However, they are not as clearly separated as the singular vectors of the France dataset.

In Section 5 we use these key observations to improve the recommendation quality of the SGD.
Fig. 7: The first, second and fourth singular vectors of the Nomao France rating matrix by SVD (left) and SGD (right). Here, the SVD eigenvectors correspond to Paris and Toulouse; Bordeaux, Toulouse and Marseille; Bordeaux and Marseille respectively, while the SGD plots for the respective vectors are scattered around several cities.
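The per-event updates (8) and (9) of the regularized latent factor model can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the implementation used for the experiments: the function names, the initialization scale, the learning rate, the regularization weight and the toy ratings are all hypothetical. The regularized objective is assumed to be the squared error plus $\lambda$ times the squared norms of the factors, which is consistent with the two update rules up to a constant absorbed in the learning rate. The ranked, one-factor-at-a-time training mentioned in the text is not reproduced.

```python
import numpy as np

def sgd_factorize(ratings, n_users, n_items, k=2, lrate=0.02, lam=0.01,
                  epochs=500, seed=0):
    """Regularized matrix factorization trained on known ratings only.

    ratings: list of (user_index, item_index, rating) triples.
    Returns the user factors P (n_users x k) and item factors Q (n_items x k).
    """
    rng = np.random.default_rng(seed)
    P = 0.1 * rng.standard_normal((n_users, k))
    Q = 0.1 * rng.standard_normal((n_items, k))
    ratings = list(ratings)
    for _ in range(epochs):
        for idx in rng.permutation(len(ratings)):
            u, i, r = ratings[idx]
            err = r - P[u] @ Q[i]          # r(u,i) minus the current prediction
            p_old = P[u].copy()            # use the pre-update user factors in (9)
            P[u] += lrate * (err * Q[i] - lam * P[u])   # update (8)
            Q[i] += lrate * (err * p_old - lam * Q[i])  # update (9)
    return P, Q

def rmse(ratings, P, Q):
    """Root mean squared error over the given (known) ratings."""
    return float(np.sqrt(np.mean([(r - P[u] @ Q[i]) ** 2 for u, i, r in ratings])))

# Toy data: rank-one ratings between 1 and 5 with three entries left unknown.
user_scale = [1.0, 2.0, 1.5, 2.0]
item_scale = [1.0, 2.0, 2.5]
unknown = {(0, 2), (1, 1), (3, 0)}
toy = [(u, i, user_scale[u] * item_scale[i])
       for u in range(4) for i in range(3) if (u, i) not in unknown]

rmse_before = rmse(toy, *sgd_factorize(toy, 4, 3, epochs=0))
rmse_after = rmse(toy, *sgd_factorize(toy, 4, 3))
```

Because only the known triples are visited, missing entries never enter the loss, which is the practical difference from the zero-imputed SVD of Section 4.3.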
5 Prediction for ratings and visits to locations

5.1 Recommender evaluation 


Recommender systems serve to find new products that are relevant for the users. More specifically, for a given user $u$ and item $i$ a recommender system may retrieve the predicted relevance $\hat{r}_{ui}$. This is called the rating prediction task. The Netflix Prize competition [4] was a challenge in rating prediction. While in the Netflix Prize competition contestants were optimizing to predict all ratings of the users, a recommender system in practice selects the top rated items for a given user. In this top-$K$ prediction task [9, 8, 33], a recommender system should retrieve for a given user a top list of items of length $K$. The top list should contain the most relevant items for the given user. This problem is closer to applications than the rating prediction task. In our experiments we examine both problems on the NOMAO datasets.

In addition to the RMSE defined by equations (1) for full and (2) for partial matrices, we use two measures that evaluate the accuracy of the top-$K$ recommendation task.

Recall at $k$ is defined through the number of relevant POIs among the highest $k$ values of row $u$ in the matrix approximation,

$$\mathrm{Recall}_u(k) = \frac{1}{R_u} \sum_{i=1}^{k} \mathrm{rel}_{u,i}, \qquad (10)$$

where $\mathrm{rel}_{u,i}$ is the actual relevance, in the evaluation data, of the POI ranked $i$-th for user $u$, and $R_u$ is the number of relevant items for user $u$ in the dataset. We may average over all users to obtain

$$\mathrm{Recall}(k) = \frac{1}{|U|} \sum_{u} \mathrm{Recall}_u(k). \qquad (11)$$

Normalized Discounted Cumulative Gain at $k$ weights the relevance by the order of the predicted values as

$$\mathrm{NDCG}_u(k) = \frac{\mathrm{DCG}_u(k)}{\mathrm{iDCG}_u(k)}, \qquad (12)$$

where

$$\mathrm{DCG}_u(k) = \sum_{i=1}^{k} \frac{\mathrm{rel}_{u,i}}{\log_2(i + 1)} \qquad (13)$$

and

$$\mathrm{NDCG}(k) = \frac{1}{|U|} \sum_{u} \mathrm{NDCG}_u(k). \qquad (14)$$

In our experiments, we randomly split the data into training and test sets. We only use records in the training set to set the parameters of our model. The lower the MSE and the higher the NDCG and recall we measure on the test set, the better our model is.
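The two top-$K$ measures can be written compactly for a single user. The sketch below is a minimal illustration with NumPy; the function names and the four-item example are hypothetical, and the ideal DCG is taken, as is standard, to be the DCG of the ordering that sorts items by their true relevance.

```python
import numpy as np

def recall_at_k(scores, relevance, k):
    """Fraction of the user's relevant items found among the k highest scores."""
    top = np.argsort(-scores)[:k]
    n_relevant = relevance.sum()
    return float(relevance[top].sum() / n_relevant) if n_relevant > 0 else 0.0

def dcg_at_k(relevance_in_rank_order, k):
    """Discounted cumulative gain of the first k positions (ranks start at 1)."""
    rel = np.asarray(relevance_in_rank_order, dtype=float)[:k]
    ranks = np.arange(1, rel.size + 1)
    return float(np.sum(rel / np.log2(ranks + 1)))

def ndcg_at_k(scores, relevance, k):
    """DCG of the predicted ordering divided by the DCG of the ideal ordering."""
    order = np.argsort(-scores)
    ideal = np.sort(relevance)[::-1]
    idcg = dcg_at_k(ideal, k)
    return dcg_at_k(relevance[order], k) / idcg if idcg > 0 else 0.0

# One user, four items: predicted scores and true binary relevance.
scores = np.array([0.9, 0.1, 0.8, 0.3])
relevance = np.array([1.0, 0.0, 0.0, 1.0])
user_recall = recall_at_k(scores, relevance, k=2)
user_ndcg = ndcg_at_k(scores, relevance, k=2)
```

In this example only one of the two relevant items appears in the top two, so the recall is one half, while the NDCG is somewhat higher because the hit sits at rank one. Averaging these per-user values over all users gives the dataset-level figures.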




Fig. 8: The first, third and fourth singular vectors of the Nomao Paris rating matrix obtained by SVD.


5.2 The rating prediction task 


As indicated in Table 1 (bottom) and in Fig. 5, the scores have a peaked distribution. This indicates first that the rating prediction task makes less sense with these datasets. We trained an SGD recommender using 50% of the datasets and computed NDCG($k$) for $k = 1, \ldots, 20$. To understand the performance of the model, we also measured the performance of a random recommender that predicts ratings uniformly at random. We repeated our experiments 10 times with 10 different random training and test sets. Fig. 9 shows the ten computed performance curves for the SGD and for the baseline random recommendation. Both for SGD and for the random prediction the ten curves are similar. This indicates the stability of our algorithms and evaluation metrics. We achieved significantly better results with the SGD recommender. However, for the random algorithm the baseline NDCG is already around 0.85. This is due to the fact that most of the ratings are around the mean, as the score distribution is peaked.
Fig. 9: Performance of score prediction on the France (top) and Paris (bottom) datasets.

Fig. 10: Recall@100 and NDCG@100 for expansion with the original ratings.


5.3 Improving top recommendation with rating 
imputation 

Instead of simply recommending locations near already visited places, we expand the training set by relying on the locality of the ratings. We compare our results using SVD or SGD both for the rating matrix and for simply predicting the visits, i.e. the existence of a rating regardless of its value. When considering locality, we may identify the nearest neighbors by taking the absolute distance and possibly correcting by density: in an area densely served by POIs customers may reach more locations; on the other hand, the speed of travel there is likely lower than in rural areas.
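As an illustration of the locality idea, the sketch below finds the nearest POIs by great-circle distance and expands a user's known ratings to unrated neighbors with a constant value, in the spirit of the constant strategy defined later in this section. It is a sketch under stated assumptions: the function names are hypothetical, the haversine distance and the brute-force search over all pairs are choices made here for brevity, and no density correction is applied.

```python
import numpy as np

def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in kilometers between points given in degrees."""
    lat1, lon1, lat2, lon2 = map(np.radians, (lat1, lon1, lat2, lon2))
    a = (np.sin((lat2 - lat1) / 2) ** 2
         + np.cos(lat1) * np.cos(lat2) * np.sin((lon2 - lon1) / 2) ** 2)
    return 2 * 6371.0 * np.arcsin(np.sqrt(a))

def nearest_neighbors(coords, n_neighbors):
    """coords: array (n_items, 2) of (lat, lon). Returns item -> nearest items."""
    lat, lon = coords[:, 0], coords[:, 1]
    dist = haversine_km(lat[:, None], lon[:, None], lat[None, :], lon[None, :])
    np.fill_diagonal(dist, np.inf)  # an item is not its own neighbor
    order = np.argsort(dist, axis=1)[:, :n_neighbors]
    return {j: order[j].tolist() for j in range(len(coords))}

def expand_constant(known, neighbors, constant):
    """known: dict (user, item) -> rating. Unrated neighbors of rated items
    receive the constant; known ratings are never overwritten."""
    expanded = dict(known)
    for (u, j) in known:
        for i in neighbors[j]:
            expanded.setdefault((u, i), constant)
    return expanded

# Two POIs in central Paris and one in Toulouse (approximate coordinates).
coords = np.array([[48.8566, 2.3522], [48.8606, 2.3376], [43.6047, 1.4442]])
neighbors = nearest_neighbors(coords, n_neighbors=1)
expanded = expand_constant({(0, 0): 4.0}, neighbors, constant=3.0)
paris_toulouse_km = float(haversine_km(48.8566, 2.3522, 43.6047, 1.4442))
```

For datasets of realistic size the quadratic distance matrix would be replaced by a spatial index, and the number of neighbors becomes the main tuning parameter of the expansion.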

For our imputation methods, let $E$ be the set of known ratings and $N_j$ the neighbors of location $j$. We modify the training set as follows. For all $(u, i)$,

$$\tilde{r}_{u,i} = \begin{cases} r_{u,i} & \text{if } (u, i) \in E, \\ f(R_u, N_{u,i}) & \text{if } (u, i) \notin E \text{ and } i \in N_j \text{ for some } (u, j) \in E, \\ \text{missing} & \text{otherwise,} \end{cases} \qquad (15)$$

where $f$ is a function of $R_u$, the set of known ratings by user $u$, and $N_{u,i}$, the set of locations visited by $u$ in the neighborhood of $i$.

In our model, we expand the list of locations per user with the neighbors of visited places by two strategies:

Constant:

$$f(R_u, N_{u,i}) = C, \qquad (16)$$

Ratings Average: $f(R_u, N_{u,i})$ is an average over the known ratings of user $u$. (17)

The performance for expansion with the original ratings (see (17)) on the France dataset is seen in Fig. 10, where we observe that expansion by the 30-40 nearest POIs significantly improves the matrix approximation by the first few eigenvectors.

We may also consider the task of predicting which POIs the user will visit, regardless of the actual rating given by the user. In this so-called implicit recommendation task, we consider a 0-1 matrix. Although the matrix is fully known, the meaning of a “1” is certain while a “0” may simply mean that the user has not yet had a chance to visit the POI or does not even know about it. Based on (16), the performance of the implicit task with expansion for the France dataset is seen in Fig. 11, showing an improvement compared to Fig. 10. However, for the Paris dataset, both in the case of the ratings and of the implicit expansion experiments, we could not further improve on the original SVD. This may be due to the fact that the Paris dataset is geographically denser.


5.4 Improving recommendation with fixed factors



The results of Fig. 7 indicate that while SGD finds the most important cities in France, it cannot separate them precisely. Furthermore, not recommending to a user POIs that he/she has not visited can be easily implemented without using SGD. Indeed, SGD should learn the taste of the different users, as in the case of the movie prediction task of Netflix. To fix this issue in the France dataset, we selected the top $t$ cities in France. For a given item, we fixed the $i$th factor to 1 if the item is located in the $i$th city, and to 0 otherwise. We set the user factors similarly according to the places visited by the user in the test set. We then trained a $k$-dimensional latent factor model where we updated only the remaining $k - t$ dimensions. We compared this recommender with a traditional $k$-dimensional SGD recommender.
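The fixed-factor idea can be sketched by freezing the first $t$ columns of both factor matrices to city indicators and running the usual regularized updates on the remaining $k - t$ columns only. The code below is an illustration under stated assumptions rather than the authors' implementation: the function name, the encoding of cities as integers with -1 for "none of the top $t$ cities", the hyperparameters, and the choice to derive the user indicators from the ratings passed to the function are all hypothetical.

```python
import numpy as np

def sgd_fixed_factors(ratings, item_city, n_users, n_items, t, k,
                      lrate=0.02, lam=0.01, epochs=200, seed=0):
    """Latent factor model whose first t dimensions are fixed city indicators.

    ratings: list of (user, item, rating); item_city[i] is the index of the
    city of item i among the top t cities, or -1 if it lies in none of them.
    """
    rng = np.random.default_rng(seed)
    P = 0.1 * rng.standard_normal((n_users, k))
    Q = 0.1 * rng.standard_normal((n_items, k))
    P[:, :t] = 0.0
    Q[:, :t] = 0.0
    for i, c in enumerate(item_city):
        if 0 <= c < t:
            Q[i, c] = 1.0          # item i lies in city c
    for u, i, _ in ratings:
        c = item_city[i]
        if 0 <= c < t:
            P[u, c] = 1.0          # user u visited a place in city c
    free = slice(t, k)             # only these dimensions are trained
    for _ in range(epochs):
        for idx in rng.permutation(len(ratings)):
            u, i, r = ratings[idx]
            err = r - P[u] @ Q[i]  # the fixed block contributes to the prediction
            p_old = P[u, free].copy()
            P[u, free] += lrate * (err * Q[i, free] - lam * P[u, free])
            Q[i, free] += lrate * (err * p_old - lam * Q[i, free])
    return P, Q

# Toy example: two top cities, three items, one free latent dimension.
item_city = [0, 1, -1]
toy = [(0, 0, 4.0), (0, 2, 3.0), (1, 1, 5.0)]
P, Q = sgd_fixed_factors(toy, item_city, n_users=2, n_items=3, t=2, k=3)
```

After training, the first two columns of `P` and `Q` are still the binary city indicators, so any geographic match between user and item enters the prediction directly and the free dimension is left to model the remaining variation.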
In our experiments we included the top $t = 10$ cities, in order Paris, Marseille, Lyon, Toulouse, Nice, Nantes, Strasbourg, Montpellier, Bordeaux, and Lille. Fig. 12 shows the MSE on the test set as a function of the number of iterations on the training set. With the fixed factor model we can achieve a significantly better MSE. Furthermore, our best result is achieved with half as many iterations as needed to train the original latent factor model.

Fig. 11: Recall@100 and NDCG@100 for expansion nearby visited locations.

Fig. 12: Improved MSE results with fixed factors on the France dataset.


6 Discussion

Our statistical analysis of NOMAO votes of customers for spots in France shows that they are described by power law frequency distributions with exponents $a \approx 1.5$ (for spots) and $b \approx 2.75$ (for customers), which remain stable in time even if the number of votes increases by almost two orders of magnitude during this time period. Further studies are required to establish the physical origins of such laws and to clarify how universal they are. It is possible that the physical reasons for the emergence of distributions of this type have certain similarities with the phenomenon of self-organized criticality broadly discussed in physical systems (see e.g. [27, 30]). It is interesting to note that the cluster distribution in self-organized critical models in 3D has an exponent close to 1.4 [1], not so far from the exponent $a = 1.5$ we find for spots.

We explored the spectrum and the singular vectors of a POI ratings matrix of customer votes for spots in France. The fact that the matrix consists of 99.5% missing values makes the spectrum highly dependent on how we handle the missing values. We computed the SVD of the full 0-1 “implicit” matrix of the visits without considering the rating. For the ratings matrix, we used SGD, a popular approach that uses only the known values to compute the factors. We observed that SGD and SVD factors are similar but SVD has stronger geo-localization. SVD singular vectors with the highest singular values are mostly correlated with a particular place. As key practical observations, we found that imputing the missing ratings for the neighbors of visited places could increase the performance, and that defining fixed geographic factors could improve SGD recommendation quality.

We expect that a broader analysis of a larger number of datasets of votes of a similar type will allow a better understanding of the underlying physical process and provide better recommendations for specific customers and spots.